Check overall methods performance:

Best set proposals

Top 3 sets which achieved the best accuracy in training, testing and validation:

(metaindex = harmonic mean of all 3 accuracy metrics)

                        metaindex                  method
ElasticNet              0.8188778              ElasticNet
Mystepwise_glm_binomial 0.8182966 Mystepwise_glm_binomial
AUC_MDL                 0.8161237                 AUC_MDL
                                                                                                                                                                                            miRy
ElasticNet                                           Class ~ hsa.let.7b.5p + hsa.miR.30d.5p + hsa.miR.320b + hsa.miR.19b.3p + hsa.miR.20b.5p + hsa.miR.1304.3p + hsa.miR.139.3p + hsa.miR.375.3p
Mystepwise_glm_binomial                           Class ~ hsa.miR.19b.3p + hsa.miR.4433a.3p + hsa.let.7b.5p + hsa.miR.106b.5p + hsa.miR.1304.3p + hsa.miR.30d.5p + hsa.miR.320b + hsa.miR.375.3p
AUC_MDL                 Class ~ hsa.miR.20b.5p + hsa.miR.19b.3p + hsa.let.7b.5p + hsa.miR.320b + hsa.miR.30d.5p + hsa.miR.139.3p + hsa.miR.17.5p + hsa.miR.182.5p + hsa.miR.421 + hsa.miR.375.3p

Performance of those signatures:

Overlap of those signatures:

$`ElasticNet:AUC_MDL`
[1] "hsa.miR.20b.5p" "hsa.miR.139.3p"

$Mystepwise_glm_binomial
[1] "hsa.miR.4433a.3p" "hsa.miR.106b.5p" 

$`ElasticNet:Mystepwise_glm_binomial`
[1] "hsa.miR.1304.3p"

$`ElasticNet:Mystepwise_glm_binomial:AUC_MDL`
[1] "hsa.let.7b.5p"  "hsa.miR.30d.5p" "hsa.miR.320b"   "hsa.miR.19b.3p" "hsa.miR.375.3p"

$AUC_MDL
[1] "hsa.miR.17.5p"  "hsa.miR.182.5p" "hsa.miR.421"   
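
The overlap listing above can be reproduced from the three formulas with plain set arithmetic. The following is a minimal sketch in Python, not the code the report itself ran; the function name `exclusive_overlaps` is hypothetical, and the feature names are copied from the signatures above. Each region holds the features found in exactly that combination of methods and in no other.

```python
from itertools import combinations

# Feature sets copied from the three signatures listed above.
signatures = {
    "ElasticNet": {
        "hsa.let.7b.5p", "hsa.miR.30d.5p", "hsa.miR.320b", "hsa.miR.19b.3p",
        "hsa.miR.20b.5p", "hsa.miR.1304.3p", "hsa.miR.139.3p", "hsa.miR.375.3p",
    },
    "Mystepwise_glm_binomial": {
        "hsa.miR.19b.3p", "hsa.miR.4433a.3p", "hsa.let.7b.5p", "hsa.miR.106b.5p",
        "hsa.miR.1304.3p", "hsa.miR.30d.5p", "hsa.miR.320b", "hsa.miR.375.3p",
    },
    "AUC_MDL": {
        "hsa.miR.20b.5p", "hsa.miR.19b.3p", "hsa.let.7b.5p", "hsa.miR.320b",
        "hsa.miR.30d.5p", "hsa.miR.139.3p", "hsa.miR.17.5p", "hsa.miR.182.5p",
        "hsa.miR.421", "hsa.miR.375.3p",
    },
}


def exclusive_overlaps(sets):
    """Map each combination of set names to the items found in exactly those sets."""
    names = list(sets)
    regions = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # Items shared by every set in the combination ...
            inside = set.intersection(*(sets[n] for n in combo))
            # ... minus anything that also appears in a set outside it.
            outside = set().union(*(sets[n] for n in names if n not in combo))
            region = inside - outside
            if region:  # skip empty regions, as the listing above does
                regions[":".join(combo)] = sorted(region)
    return regions


overlaps = exclusive_overlaps(signatures)
for region, mirnas in overlaps.items():
    print(region, mirnas)
```

Empty regions are skipped, which is why no entry appears for features unique to ElasticNet: every ElasticNet feature is shared with at least one other method.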

Top 3 sets which achieved the best accuracy in testing and validation:

(metaindex = mean of 2 accuracy metrics)

                         metaindex                   method
feseR_combineFS_RF_SMOTE 0.7945796 feseR_combineFS_RF_SMOTE
fwrap                    0.7928404                    fwrap
AUC_MDL                  0.7927240                  AUC_MDL
                                                                                                                                                                                             miRy
feseR_combineFS_RF_SMOTE                                Class ~ hsa.miR.320b + hsa.let.7b.5p + hsa.miR.19b.3p + hsa.miR.20b.5p + hsa.miR.30d.5p + hsa.miR.139.3p + hsa.miR.17.5p + hsa.miR.375.3p
fwrap                                                                                                                   Class ~ hsa.miR.1273h.3p + hsa.miR.182.5p + hsa.miR.20b.5p + hsa.miR.320b
AUC_MDL                  Class ~ hsa.miR.20b.5p + hsa.miR.19b.3p + hsa.let.7b.5p + hsa.miR.320b + hsa.miR.30d.5p + hsa.miR.139.3p + hsa.miR.17.5p + hsa.miR.182.5p + hsa.miR.421 + hsa.miR.375.3p

Performance of those signatures:

Overlap of those signatures:

$`fwrap:AUC_MDL`
[1] "hsa.miR.182.5p"

$`feseR_combineFS_RF_SMOTE:AUC_MDL`
[1] "hsa.let.7b.5p"  "hsa.miR.19b.3p" "hsa.miR.30d.5p" "hsa.miR.139.3p" "hsa.miR.17.5p"  "hsa.miR.375.3p"

$`feseR_combineFS_RF_SMOTE:fwrap:AUC_MDL`
[1] "hsa.miR.320b"   "hsa.miR.20b.5p"

$fwrap
[1] "hsa.miR.1273h.3p"

$AUC_MDL
[1] "hsa.miR.421"

Top 3 sets which achieved the best sensitivity and specificity in validation:

(metaindex = mean of sensitivity and specificity in validation dataset)

                         metaindex                   method
fwrap                    0.5701070                    fwrap
feseR_combineFS_RF_SMOTE 0.5648820 feseR_combineFS_RF_SMOTE
AUC_MDLSMOTE             0.5583661             AUC_MDLSMOTE
                                                                                                                                                                                             miRy
fwrap                                                                                                                   Class ~ hsa.miR.1273h.3p + hsa.miR.182.5p + hsa.miR.20b.5p + hsa.miR.320b
feseR_combineFS_RF_SMOTE                                Class ~ hsa.miR.320b + hsa.let.7b.5p + hsa.miR.19b.3p + hsa.miR.20b.5p + hsa.miR.30d.5p + hsa.miR.139.3p + hsa.miR.17.5p + hsa.miR.375.3p
AUC_MDLSMOTE             Class ~ hsa.miR.20b.5p + hsa.miR.19b.3p + hsa.let.7b.5p + hsa.miR.320b + hsa.miR.139.3p + hsa.miR.30d.5p + hsa.miR.17.5p + hsa.miR.182.5p + hsa.miR.421 + hsa.miR.375.3p

Performance of those signatures:

Overlap of those signatures:

$AUC_MDLSMOTE
[1] "hsa.miR.421"

$`fwrap:AUC_MDLSMOTE`
[1] "hsa.miR.182.5p"

$`feseR_combineFS_RF_SMOTE:AUC_MDLSMOTE`
[1] "hsa.let.7b.5p"  "hsa.miR.19b.3p" "hsa.miR.30d.5p" "hsa.miR.139.3p" "hsa.miR.17.5p"  "hsa.miR.375.3p"

$`fwrap:feseR_combineFS_RF_SMOTE:AUC_MDLSMOTE`
[1] "hsa.miR.20b.5p" "hsa.miR.320b"  

$fwrap
[1] "hsa.miR.1273h.3p"

General analysis

Overfitting analysis:

By default, this is performed for the top 6 sets which achieved the best accuracy in training, testing and validation.

Relationship between accuracy on testing and validation sets:

For top 6 methods.

Best feature set

By default, we choose the feature set which achieved the highest metaindex (harmonic mean of accuracy in training, testing and validation).

Best set:

           metaindex     method
ElasticNet 0.8188778 ElasticNet
                                                                                                                                                  miRy
ElasticNet Class ~ hsa.let.7b.5p + hsa.miR.30d.5p + hsa.miR.320b + hsa.miR.19b.3p + hsa.miR.20b.5p + hsa.miR.1304.3p + hsa.miR.139.3p + hsa.miR.375.3p

DE of selected features:

This should serve as a sanity check.

               miR     log2FC      p-value   p-value BH
3    hsa.let.7b.5p -0.5614259 1.065803e-22 6.750086e-22
5   hsa.miR.30d.5p -0.5107887 2.985520e-16 1.134498e-15
4     hsa.miR.320b  0.7320847 6.053835e-17 2.875571e-16
2   hsa.miR.19b.3p -0.9074407 8.427626e-24 8.006244e-23
1   hsa.miR.20b.5p -1.2362520 4.093534e-25 7.777715e-24
17 hsa.miR.1304.3p  0.5522948 5.913518e-04 6.609226e-04
7   hsa.miR.139.3p  0.7180460 9.716613e-15 2.637366e-14
10  hsa.miR.375.3p  1.0764252 1.141904e-09 2.169618e-09

Exploratory analysis best set:

Best classifiers

Based on benchmark results. You could achieve a better model with further tuning. Metaindex - harmonic mean of accuracy on training, testing and validation datasets. Metaindex2 - mean accuracy on testing and validation datasets only.

                             Name               ID Modelling Method         Selection Method     Train ROC AUC         Train Acc
153 C5.0+RandomForestRFESMOTE_sig 1661466163.44172             C5.0 RandomForestRFESMOTE_sig                 1                 1
134 C5.0+feseR_combineFS_RF_SMOTE 1661466123.98702             C5.0 feseR_combineFS_RF_SMOTE 0.992423055838358 0.971238938053097
148              C5.0+SU_MDLSMOTE 1661466152.72664             C5.0              SU_MDLSMOTE                 1                 1
120   rf+RandomForestRFESMOTE_sig 1661466089.46937               rf RandomForestRFESMOTE_sig                 1                 1
112                rf+sigtopSMOTE 1661466075.12167               rf              sigtopSMOTE                 1                 1
145              C5.0+sigtopSMOTE 1661466145.20905             C5.0              sigtopSMOTE 0.999921685331663 0.995575221238938
131                 rf+ElasticNet 1661466109.38676               rf               ElasticNet                 1                 1
143                   C5.0+sigtop 1661466140.78412             C5.0                   sigtop 0.999892732636095 0.994884910485934
152      C5.0+RandomForestRFE_sig 1661466160.63635             C5.0      RandomForestRFE_sig                 1                 1
151          C5.0+RandomForestRFE 1661466157.97191             C5.0          RandomForestRFE                 1                 1
147             C5.0+AUC_MDLSMOTE 1661466149.76921             C5.0             AUC_MDLSMOTE                 1                 1
100         rf+feseR_combineFS_RF 1661466053.67859               rf       feseR_combineFS_RF                 1                 1
119        rf+RandomForestRFE_sig 1661466087.65714               rf      RandomForestRFE_sig                 1                 1
164               C5.0+ElasticNet 1661466183.65821             C5.0               ElasticNet 0.999570930544382 0.989769820971867
101   rf+feseR_combineFS_RF_SMOTE 1661466055.33465               rf feseR_combineFS_RF_SMOTE                 1                 1
146               C5.0+topFCSMOTE 1661466147.08534             C5.0               topFCSMOTE 0.999706319993735 0.988938053097345
108                    rf+AUC_MDL 1661466067.82423               rf                  AUC_MDL                 1                 1
111                      rf+topFC 1661466073.03387               rf                    topFC                 1                 1
129                      rf+spFSR 1661466105.67747               rf                    spFSR                 1                 1
118            rf+RandomForestRFE 1661466085.79472               rf          RandomForestRFE                 1                 1
             Test Acc         Valid Acc         Metaindex
153 0.884615384615385  0.78030303030303 0.879252752690833
134 0.907692307692308 0.765151515151515 0.872539853809606
148 0.830769230769231 0.803030303030303 0.869820686860501
120 0.884615384615385 0.757575757575758 0.869455645161291
112 0.884615384615385 0.757575757575758 0.869455645161291
145 0.884615384615385 0.757575757575758 0.868337155321886
131 0.869230769230769 0.765151515151515  0.86771078841329
143 0.861538461538462 0.772727272727273 0.867058708758512
152 0.853846153846154 0.772727272727273 0.865728704694908
151 0.853846153846154 0.772727272727273 0.865728704694908
147 0.884615384615385 0.742424242424242 0.862720081653483
100 0.884615384615385 0.742424242424242 0.862720081653483
119 0.861538461538462 0.757575757575758  0.86189205828032
164 0.869230769230769 0.757575757575758 0.861876183829079
101 0.846153846153846 0.765151515151515 0.859907120743034
146 0.861538461538462 0.757575757575758 0.859131139911525
108 0.869230769230769 0.742424242424242 0.857784663051898
111 0.846153846153846 0.757575757575758 0.856697819314642
129 0.853846153846154              0.75 0.856041131105398
118 0.853846153846154              0.75 0.856041131105398
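
The Metaindex column above is consistent with the harmonic mean of the three accuracy columns (Train Acc, Test Acc, Valid Acc). As a quick check, the sketch below recomputes it in Python for the first three rows, with the accuracies copied from the table. It is an illustration only, not the code that produced the report.

```python
from statistics import harmonic_mean

# (Train Acc, Test Acc, Valid Acc) copied from the benchmark table above.
accuracies = {
    "C5.0+RandomForestRFESMOTE_sig": (1.0, 0.884615384615385, 0.78030303030303),
    "C5.0+feseR_combineFS_RF_SMOTE": (0.971238938053097, 0.907692307692308, 0.765151515151515),
    "C5.0+SU_MDLSMOTE": (1.0, 0.830769230769231, 0.803030303030303),
}

# Harmonic mean: n / sum(1 / x_i). It is pulled toward the lowest accuracy,
# so a model that is perfect in training but weaker in validation is penalized
# more than it would be by an arithmetic mean.
metaindex = {name: harmonic_mean(acc) for name, acc in accuracies.items()}

for name, value in metaindex.items():
    print(f"{name}: {value:.6f}")
```

For the top-ranked row the arithmetic mean of the same three values would be about 0.888, higher than the reported 0.879, which is what distinguishes the two definitions.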

Best classifier for metaindex:

Metrics on training set:

Confusion Matrix and Statistics

          Reference
Prediction Control Case
   Control     165    0
   Case          0  226
                                     
               Accuracy : 1          
                 95% CI : (0.9906, 1)
    No Information Rate : 0.578      
    P-Value [Acc > NIR] : < 2.2e-16  
                                     
                  Kappa : 1          
                                     
 Mcnemar's Test P-Value : NA         
                                     
            Sensitivity : 1.000      
            Specificity : 1.000      
         Pos Pred Value : 1.000      
         Neg Pred Value : 1.000      
             Prevalence : 0.578      
         Detection Rate : 0.578      
   Detection Prevalence : 0.578      
      Balanced Accuracy : 1.000      
                                     
       'Positive' Class : Case       
                                     
Model file: models/1661466163.44172.RDS

Call:
roc.formula(formula = train$Class ~ predtrain_y)

Data: predtrain_y in 165 controls (train$Class Control) < 226 cases (train$Class Case).
Area under the curve: 1
95% CI: 1-1 (DeLong)

Metrics on testing set:

Confusion Matrix and Statistics

          Reference
Prediction Control Case
   Control      49    9
   Case          6   66
                                         
               Accuracy : 0.8846         
                 95% CI : (0.8168, 0.934)
    No Information Rate : 0.5769         
    P-Value [Acc > NIR] : 1.721e-14      
                                         
                  Kappa : 0.7653         
                                         
 Mcnemar's Test P-Value : 0.6056         
                                         
            Sensitivity : 0.8800         
            Specificity : 0.8909         
         Pos Pred Value : 0.9167         
         Neg Pred Value : 0.8448         
             Prevalence : 0.5769         
         Detection Rate : 0.5077         
   Detection Prevalence : 0.5538         
      Balanced Accuracy : 0.8855         
                                         
       'Positive' Class : Case           
                                         

Metrics on validation set:

Confusion Matrix and Statistics

          Reference
Prediction Control Case
   Control      59    5
   Case         24   44
                                       
               Accuracy : 0.7803       
                 95% CI : (0.7, 0.8477)
    No Information Rate : 0.6288       
    P-Value [Acc > NIR] : 0.0001367    
                                       
                  Kappa : 0.564        
                                       
 Mcnemar's Test P-Value : 0.0008302    
                                       
            Sensitivity : 0.8980       
            Specificity : 0.7108       
         Pos Pred Value : 0.6471       
         Neg Pred Value : 0.9219       
             Prevalence : 0.3712       
         Detection Rate : 0.3333       
   Detection Prevalence : 0.5152       
      Balanced Accuracy : 0.8044       
                                       
       'Positive' Class : Case         
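
The statistics reported for the validation set follow directly from the four cells of the confusion matrix, with "Case" as the positive class. The sketch below recomputes them in Python from the counts printed above; the helper name `binary_metrics` is hypothetical and this is not the routine the report used.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),     # positive predictive value
        "npv": tn / (tn + fn),     # negative predictive value
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": (accuracy - expected) / (1 - expected),  # Cohen's kappa
    }


# Validation set above: rows are predictions, columns are the reference.
# Predicted Case:    24 Control (FP), 44 Case (TP)
# Predicted Control: 59 Control (TN),  5 Case (FN)
validation = binary_metrics(tp=44, fp=24, fn=5, tn=59)
for name, value in validation.items():
    print(f"{name}: {value:.4f}")
```

The 24 controls predicted as cases account for the drop in specificity (0.7108) and positive predictive value (0.6471) relative to the testing set.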
                                       

This is the end. Timestamp of the analysis:


[2022-08-26 00:32:05 | pid:19469] [OmicSelector: TASK COMPLETED]